Efficient Parallel Classification Using Dimensional Aggregates

Authors

  • Sanjay Goil
  • Alok Choudhary
Abstract

Multidimensional aggregates are frequently computed to improve query performance in Online Analytical Processing applications. We present a new method for decision-tree-based classification that uses the aggregates computed in the multidimensional data model. The structure imposed on data in an explicit multidimensional storage mechanism leads to efficient dimensional operations. Decision-tree-based classification algorithms perform computations to find the best split point at each node of the tree. The split can be computed efficiently from the one-dimensional aggregates if the cell values are the class-id values and counts are maintained for each class. This is used repeatedly at the nodes of the decision tree to calculate splits and manage data. Previous parallel approaches for decision-tree-based classification use sorted attribute lists and hash tables to compute the split point and split the data appropriately. In the worst case, the amount of data they communicate at each level of the tree is proportional to the product of the number of records in the training set and the number of dimensions. The parallel formulation of our approach uses data communication proportional to the product of the sum of the cardinalities of all dimensions and the number of non-classified nodes at each level of the tree. Communication volume is greatly reduced in our approach, and communication is done in one phase at each level of the tree by coalescing messages. Preliminary results from our experiments on a coarse-grained, distributed-memory parallel machine (IBM-SP2) show good performance.
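As a rough illustration of the idea in the abstract, the sketch below picks a split point for one ordered dimension from its one-dimensional aggregate of per-class counts. It is not the authors' implementation: the Gini index as the split criterion, the function names `gini`, `merge_counts`, and `best_split`, and the table layout `class_counts[v][k]` are all assumptions made for this example. The `merge_counts` helper hints at why communication can scale with dimension cardinality rather than record count: each processor only needs to contribute a small count table.

```python
from typing import List, Tuple


def gini(counts: List[int]) -> float:
    """Gini impurity of a class-count vector (assumed split criterion)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def merge_counts(tables: List[List[List[int]]]) -> List[List[int]]:
    """Element-wise sum of per-processor count tables for one dimension.

    Each table has shape (cardinality x number of classes), so its size
    does not depend on the number of training records.
    """
    merged = [[0] * len(row) for row in tables[0]]
    for table in tables:
        for v, row in enumerate(table):
            for k, count in enumerate(row):
                merged[v][k] += count
    return merged


def best_split(class_counts: List[List[int]]) -> Tuple[int, float]:
    """Return (v, score) for the best split "value index <= v".

    class_counts[v][k] is the number of records whose attribute takes
    value index v and whose class is k (a hypothetical layout).
    Returns (-1, inf) if no split separates the records.
    """
    num_classes = len(class_counts[0])
    total = [sum(row[k] for row in class_counts) for k in range(num_classes)]
    n = sum(total)
    left = [0] * num_classes
    best_index, best_score = -1, float("inf")
    for v in range(len(class_counts) - 1):
        # Running prefix sum of class counts for values <= v.
        for k in range(num_classes):
            left[k] += class_counts[v][k]
        right = [total[k] - left[k] for k in range(num_classes)]
        n_left = sum(left)
        n_right = n - n_left
        if n_left == 0 or n_right == 0:
            continue
        # Weighted impurity of the two partitions.
        score = (n_left * gini(left) + n_right * gini(right)) / n
        if score < best_score:
            best_index, best_score = v, score
    return best_index, best_score


# Two hypothetical processors, one dimension with 4 values, 2 classes.
local_a = [[2, 0], [1, 0], [0, 3], [0, 0]]
local_b = [[1, 0], [1, 0], [0, 1], [0, 1]]
merged = merge_counts([local_a, local_b])
print(best_split(merged))
```

In this toy input the classes separate cleanly between value indices 1 and 2, so the weighted impurity at that split is zero. On a real distributed-memory machine the element-wise sum would typically be done with a collective reduction rather than a Python loop.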


Similar resources

An Efficient Recursive Partitioning Algorithm for Classification, Using Wavelets

We describe and analyze a new dyadic recursive partitioning algorithm for efficient classification of large two-dimensional data sets, called progressive classification. It uses generic (parametric or nonparametric) classifiers on a low-resolution representation of the data obtained using the discrete wavelet transform. In this representation, each point corresponds to a block of samples from the or...


An Analytical Method for Parallelization of Recursive

Communicated by Christian Lengauer. Programming with parallel skeletons is an attractive framework because it encourages programmers to develop efficient and portable parallel programs. However, extracting parallelism from sequential specifications and constructing efficient parallel programs using the skeletons are still difficult tasks. In this...


Parallel CSG, Skeletons and Performance Modelling

We describe an efficient implementation of a parallel algorithmic skeleton which supports set membership classification problems in Constructive Solid Geometry. A performance modelling methodology is developed, which realistically predicts the asymptotic performance of specific CSG applications.


A Scalable Bit-Sequential SIMD Array for Nearest-Neighbor Classification Using the City-Block

We present a fully scalable SIMD array architecture for a highly efficient implementation of pattern classification by nearest-neighbor algorithms using the city-block metric. The elementary accumulator cell is highly optimized for sequential accumulation of absolute integer differences, so that several hundred of them can be easily integrated on a single chip. A two-dimensional M × N array structur...


A Distributed Shared Memory System with Self-Adjusting Coherence Scheme

The performance of distributed shared memory depends on the memory coherence algorithms and the access characteristics of shared data. In this paper, we propose an efficient coherence scheme that uses multiple coherence algorithms with a self-adjusting feature. Our method can dynamically choose a better-suited coherence algorithm for each variable class and the incorrect classification of shared variabl...



Publication date: 1999